Abstract: In this paper, we present new results on using orthogonal matching pursuit (OMP) to
solve the sparse approximation problem over redundant dictionaries in the complex case (i.e., complex
measurement vector, complex dictionary and complex additive white Gaussian noise (CAWGN)). A
sufficient condition under which OMP recovers the optimal representation of an exactly sparse signal in the
complex case is proposed for both the noiseless and the bounded Gaussian noise settings. We extend the
exact recovery condition (ERC) results known for the real case to the complex case and derive the
corresponding ERC. We leverage this theory to show that OMP succeeds for k-sparse signals from a class
of complex dictionaries. In addition, an application based on the geometrical theory of diffraction (GTD)
model is presented for the complex case. Finally, simulation experiments illustrate the validity of the
theoretical analysis.

1. Introduction 

Before discussing the problem, we introduce some notation. We denote vectors and
matrices by boldface lowercase and uppercase letters, respectively. $(\cdot)^T$ denotes the transpose operation
and $(\cdot)^H$ denotes the conjugate transpose operation. Further, $\|\cdot\|_2$ refers to the $\ell_2$ norm for vectors.
$\mathbb{R}^{m\times n}$ and $\mathbb{C}^{m\times n}$ denote the sets of m-by-n real-valued and complex-valued matrices, and
$\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the real and imaginary parts, respectively. For a vector
$\mathbf{x} = [x_1, x_2, \cdots, x_n]^T \in \mathbb{R}^n$, let $S = \{i : |x_i| \neq 0\}$ be the support of $\mathbf{x}$ and let $\Psi(S)$ be
the set of atoms of $\Psi$ corresponding to the support $S$. The vector $\mathbf{x}$ is said to be k-sparse if the
cardinality of the set $S$ is no more than $k$ (i.e., $|S| \le k$).

Recovery of a high-dimensional sparse signal from a small number of noisy linear measurements is
a fundamental problem in the compressive sensing (CS) community. The linear measurement model can be
formulated as:

$$\mathbf{y} = \Psi\mathbf{x} + \mathbf{n} \tag{1}$$

where the observation $\mathbf{y} \in \mathbb{R}^m$, the matrix $\Psi \in \mathbb{R}^{m\times n}$, and the measurement error $\mathbf{n} \in \mathbb{R}^m$. Suppose
$\Psi = [\psi_1, \psi_2, \cdots, \psi_n]$, where $\psi_i$ denotes the i-th column of $\Psi$. Throughout the paper the matrix $\Psi$
and its i-th column are called the dictionary and the i-th atom of $\Psi$, respectively. The mutual incoherence
property (MIP) of dictionary $\Psi$ is defined as in [1]

$$\mu(\Psi) = \max_{1 \le i \ne j \le n} \frac{|\langle \psi_i, \psi_j \rangle|}{\|\psi_i\|_2 \cdot \|\psi_j\|_2} \tag{2}$$
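As a small illustration of definition (2), the following sketch computes the mutual incoherence of a dictionary stored as a list of columns. It is only a minimal example in plain Python; the function name `mutual_coherence` and the toy dictionary are assumptions made here for illustration and do not come from the paper.

```python
import math

def mutual_coherence(atoms):
    """Largest normalized inner-product magnitude between two distinct atoms.

    `atoms` is a list of columns; each column is a list of real or complex
    numbers (hypothetical data layout chosen for this sketch).
    """
    def inner(u, v):
        # Conjugate the first argument so the same code covers complex atoms.
        return sum(complex(a).conjugate() * b for a, b in zip(u, v))

    norms = [math.sqrt(inner(a, a).real) for a in atoms]
    mu = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            mu = max(mu, abs(inner(atoms[i], atoms[j])) / (norms[i] * norms[j]))
    return mu

# Toy dictionary with three atoms in R^2 (values chosen only for illustration).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(mutual_coherence(toy))
```

For this toy dictionary the largest value comes from the pair formed by a coordinate axis and the diagonal atom, so the result is close to $1/\sqrt{2}$.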

The goal of CS is to reconstruct the unknown vector $\mathbf{x} \in \mathbb{R}^n$ based on $\mathbf{y}$ and $\Psi$. A setting of significant interest
and challenge is when the dimension n of the signal is much larger than the number of measurements
m. This problem has recently received much attention in a number of fields including electrical engineering [2],
image processing [3], statistics and applied mathematics [4].

To solve an underdetermined system of linear equations of the form (1), many authors in the previous
literature use the OMP algorithm to recover the support of the k-sparse signal. Compared with
alternative methods (such as [5-8]), a major advantage of OMP is its low computational complexity.
This method has been used for signal recovery and approximation [9-12]. Support recovery has been
considered in the noiseless case by Tropp in [10], where it is shown that $\mu(\Psi) < \frac{1}{2k-1}$ is a sufficient
condition for recovering a k-sparse $\mathbf{x}$ exactly. Results in [13] imply that this
condition is in fact sharp. However, to the authors' knowledge, exact recovery condition (ERC) results
for OMP have been derived only for real measurements and dictionaries. When the observation $\mathbf{y}$, the dictionary $\Psi$
and the noise vector $\mathbf{n}$ are complex, there is no corresponding theory, although there are many applications in
complex settings. Hence, as an extension of the previous theoretical work, we assume that the observation
vector $\mathbf{y}$ and the dictionary $\Psi$ are complex, and we further assume that the measurement noise
is also complex. This is the difference between our work and the others, and it is our
major contribution.

According to the above description, with a slight abuse of notation, we can directly extend the
model (1) to the complex-valued case as follows.

$$\mathbf{y} = \Psi\mathbf{x} + \mathbf{n} \tag{3}$$

where the observation $\mathbf{y}$, the matrix $\Psi$ and the measurement error $\mathbf{n}$ have the same dimensions as in
model (1), respectively, but are complex-valued. The problem is reformulated as reconstructing the unknown vector $\mathbf{x} \in \mathbb{C}^n$ based
on the complex vector $\mathbf{y}$ and the complex dictionary $\Psi$.

The paper is organized as follows. In section 2, we briefly present the classical OMP algorithm to
solve the model (1). We analyze the ERC of the OMP algorithm for the complex-valued case in section 3.
A geometrical theory of diffraction (GTD) parametric model is presented as a practical application
in the complex setting in section 4. Finally, some conclusions and further work are provided in section 5.

2. The OMP Algorithm 

Under the model (3), the sparse solution can be obtained directly using the OMP algorithm, which
builds up the approximation iteratively. The vector $\mathbf{y}$ is approximated by a linear
combination of a few atoms in the dictionary $\Psi$, where the active set of atoms is built column by column,
in a greedy fashion. At each iteration, the atom that best correlates with the current residual is added
to the active set. Here we give a detailed description of the OMP algorithm [14].

We assume that the atoms are normalized, i.e., $\|\psi_i\|_2 = 1$ for $i = 1, 2, \cdots, n$. We denote the
support of $\mathbf{x}$ by $c \subseteq \{1, 2, \cdots, n\}$, which is defined as the set of indices corresponding to the nonzero
components of $\mathbf{x}$. $\Psi(c)$ denotes the matrix formed by picking the atoms of $\Psi$ corresponding to indices
in $c$. In this paper, we use $\psi_i$ to denote the i-th atom of $\Psi$ in (3). We call $\psi_i$ a correct atom if
the corresponding $x_i \neq 0$ and call $\psi_i$ an incorrect atom otherwise. With slight abuse of notation, we use
$\Psi(c)$ to denote both the subset of atoms and the corresponding submatrix of $\Psi$. The OMP algorithm
is stated in detail in Algorithm 1.

OMP is a stepwise forward selection algorithm and is easy to implement. A key component of it
is the stopping rule, which depends on the noise structure. In the noiseless case the natural stopping rule
is $\mathbf{r}_i = \mathbf{0}$; that is, the algorithm stops whenever $\mathbf{r}_i = \mathbf{0}$ is achieved. In this paper, both the noiseless case
and the case of Gaussian noise with $n_i \sim CN(0, \sigma^2)$ are considered. The stopping rule for each case and
the properties of the resulting procedure are discussed in section 3.

Remark 1: The OMP algorithm starts from $\mathbf{x} = \mathbf{0}$. It iteratively constructs a k-term approximant by
maintaining a set of active atoms (initially empty) and expanding the set by one additional atom at each
iteration. The atom chosen at each stage maximally reduces the residual $\ell_2$ error in approximating $\mathbf{y}$
from the currently active atoms. After constructing an approximant including the new atom, the residual
$\ell_2$ error is evaluated. If it falls below a specified threshold, the algorithm terminates. It requires $O(nmk)$
flops in total.

Remark 2: The unknown sparse vector $\mathbf{x}$ is composed of two parts,
namely the support and the non-zero values over the support. Once the support of $\mathbf{x}$ is found via the OMP
algorithm, the non-zero values of $\mathbf{x}$ are easily determined by the least squares (LS) method.

3. Performance Analysis

The performance of the OMP algorithm depends on the probability of selecting a correct atom at each
step. The probability is affected by the degree of collinearity among the variables and the noise structure.

Algorithm 1: OMP Algorithm
Require:
  The measurement vector $\mathbf{y}$;
  The dictionary $\Psi$;
  The error threshold $\epsilon$;
Ensure:
  The sparse vector $\mathbf{x}$;
1: Initialize the residual $\mathbf{r}_0 = \mathbf{y}$ and the set of selected atoms $\Psi(c_0) = \emptyset$. Let the iteration counter $i = 1$.
2: Find the atom $\psi_{t_i}$ that solves the maximization problem
   $$t_i = \arg\max_t |\psi_t^H \mathbf{r}_{i-1}|$$
   and add the atom $\psi_{t_i}$ to the set of selected atoms. Update $c_i = c_{i-1} \cup \{t_i\}$.
3: Let $\mathbf{P}_i = \Psi(c_i)\left(\Psi(c_i)^H \Psi(c_i)\right)^{-1} \Psi(c_i)^H$ denote the projection onto the linear space spanned by the
   elements of $\Psi(c_i)$. Update $\mathbf{r}_i = (\mathbf{I} - \mathbf{P}_i)\mathbf{y}$.
4: If the stopping condition is achieved (e.g., $\|\mathbf{r}_i\|_2 \le \epsilon$), go to 5. Otherwise, set $i = i + 1$ and go back
   to 2 until reaching the given threshold or the maximum number of iterations.
5: Calculate the vector $\mathbf{x}$ with the LS method.
6: Return $\mathbf{x}$.
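The steps of Algorithm 1 can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `omp`, the list-of-columns data layout and the toy data are hypothetical. Instead of forming the projection matrix explicitly, the sketch keeps an orthonormal basis of the selected atoms (Gram-Schmidt), which yields the same residual $(\mathbf{I} - \mathbf{P}_i)\mathbf{y}$, and it obtains the LS coefficients by back substitution on the resulting triangular factor.

```python
def omp(y, atoms, eps, max_iter=None):
    """Sketch of OMP: returns (support, coefficients) with y ~ sum_k coeff_k * atoms[support[k]]."""
    def inner(u, v):
        # Complex inner product u^H v.
        return sum(complex(a).conjugate() * b for a, b in zip(u, v))

    n = len(atoms)
    max_iter = min(len(y), n) if max_iter is None else max_iter
    residual = [complex(v) for v in y]
    support, basis, r_cols, z = [], [], [], []
    while abs(inner(residual, residual)) ** 0.5 > eps and len(support) < max_iter:
        # Step 2: pick the atom most correlated with the residual. Already
        # selected atoms are skipped; their correlation is zero up to rounding.
        t = max((j for j in range(n) if j not in support),
                key=lambda j: abs(inner(atoms[j], residual)))
        # Orthogonalize the new atom against the current basis (Gram-Schmidt).
        q = [complex(v) for v in atoms[t]]
        col = []
        for b in basis:
            c = inner(b, q)
            col.append(c)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        qn = abs(inner(q, q)) ** 0.5
        if qn < 1e-12:
            break  # the atom is numerically in the span of the selected ones
        q = [qi / qn for qi in q]
        col.append(qn)
        # Step 3: remove the component along the new basis vector, which
        # keeps the residual equal to (I - P_i) y.
        c = inner(q, residual)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
        support.append(t)
        basis.append(q)
        r_cols.append(col)
        z.append(c)
    # Step 5: least squares on the support via back substitution (R x = Q^H y).
    k = len(support)
    x = [0j] * k
    for i in reversed(range(k)):
        s = z[i] - sum(r_cols[j][i] * x[j] for j in range(i + 1, k))
        x[i] = s / r_cols[i][i]
    return support, x

# Toy example: four normalized atoms in C^3 and a 2-sparse complex signal.
s = 2 ** -0.5
atoms = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [s, s, 0]]
y = [2, 0, -1j]  # equals 2 * atoms[0] - 1j * atoms[2]
support, coeffs = omp(y, atoms, eps=1e-9)
print(support, coeffs)
```

In the toy example the first iteration selects atom 0 (largest correlation with $\mathbf{y}$) and the second selects atom 2, after which the residual vanishes and the loop stops.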



Our analysis of the OMP algorithm is carried out using the mutual incoherence $\mu(\cdot)$ in (2). Since
the atoms are normalized, it can be rewritten as

$$\mu(\Psi) = \max_{i \ne j} |\psi_i^H \psi_j| \tag{4}$$

To gain insight into the OMP algorithm and to illustrate the main ideas behind the proofs, it is instructive
to provide some technical analysis of the algorithm. The analysis sheds light on how and when the OMP
algorithm works properly. We point out that the ERC in the noiseless setting was established by
Tropp in 2004 for the real case in [10]. Meanwhile, T. Cai et al. investigated the properties of the OMP
algorithm for the bounded noise case as well as the Gaussian noise case in [13]. In this section, we extend
these results to the complex case and also derive the ERC for the CAWGN setting. Moreover, a
restricted isometry property (RIP) based bound for the OMP algorithm that guarantees the exact
reconstruction of sparse signals is proposed in [14], but it is beyond the scope of this paper.

3.1 ERC in the noiseless settings 

The ERC in the noiseless setting can be posed as a theorem for the success of OMP, as below.

Theorem 1: For a system of linear equations $\mathbf{y} = \Psi\mathbf{x}$ ($\Psi \in \mathbb{C}^{m\times n}$ full-rank with $m < n$), if a
solution $\mathbf{x}$ exists obeying

$$\|\mathbf{x}\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(\Psi)}\right) \tag{5}$$

OMP with threshold parameter $\epsilon_0 = 0$ is guaranteed to find it exactly, where $\|\mathbf{x}\|_0$ denotes the number of non-zero
entries in $\mathbf{x}$. We give a proof of Theorem 1. It is similar to (but not the same as) that of Theorem 4.3
in [15]. Here we assume that the dictionary is complex.

Proof: Without loss of generality, we suppose that the sparsest solution of the linear system is such
that all its k non-zero entries are at the beginning of the vector, in decreasing order of the values $|x_j|$.
Thus,

$$\mathbf{y} = \Psi\mathbf{x} = \sum_{t=1}^{k} x_t \psi_t \tag{6}$$

At the first step ($i = 0$) of the algorithm $\mathbf{r}_i = \mathbf{r}_0 = \mathbf{y}$, and the set of computed errors from the sweep
step is given by

$$\epsilon(j) = \min_{z_j} \|\psi_j z_j - \mathbf{y}\|_2^2 = \|\mathbf{y}\|_2^2 - |\psi_j^H \mathbf{y}|^2 \ge 0 \tag{7}$$

To get (7), we use the minimizer $z_j = \psi_j^H \mathbf{y}$ and suppose $\|\psi_i\|_2 = 1$ for $i = 1, 2, \cdots, n$. Thus, for
the first step to choose one of the first k entries in the vector (and thus do well), we must require that
$|\psi_1^H \mathbf{y}| > |\psi_i^H \mathbf{y}|$ is satisfied for all $i > k$. Substituting (6), this requirement transforms into

$$\left|\sum_{t=1}^{k} x_t \psi_1^H \psi_t\right| > \left|\sum_{t=1}^{k} x_t \psi_i^H \psi_t\right| \tag{8}$$

According to (8), we construct a lower bound for the left-hand side and an upper bound for the
right-hand side, and then pose the above requirement again. For the left-hand side we have

$$\left|\sum_{t=1}^{k} x_t \psi_1^H \psi_t\right| \ge |x_1|\left(1 - \mu(\Psi)(k-1)\right) \tag{9}$$

In (9), we exploit the triangle inequality and the mutual incoherence definition in (4), as well as the
decreasing order of the values $|x_j|$. Similarly, the right-hand-side term in (8) is bounded by

$$\left|\sum_{t=1}^{k} x_t \psi_i^H \psi_t\right| \le k \cdot |x_1| \cdot \mu(\Psi) \tag{10}$$

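For the derivation of (9) and (10), please refer to Appendix A. Plugging these two bounds into
the inequality (8), we obtain

$$\left|\sum_{t=1}^{k} x_t \psi_1^H \psi_t\right| \ge |x_1|\left(1 - \mu(\Psi)(k-1)\right) > |x_1|\mu(\Psi)k \ge \left|\sum_{t=1}^{k} x_t \psi_i^H \psi_t\right| \tag{11}$$

The second inequality in (11) is where the condition (5) is needed. It requires

$$1 + \mu(\Psi) > 2\mu(\Psi)k$$

or equivalently

$$k < \frac{1}{2}\left(1 + \frac{1}{\mu(\Psi)}\right) \tag{12}$$

which is exactly the sparsity condition (5). This condition guarantees the success of the first stage
of the algorithm, which implies that the chosen element must be in the correct support of the sparsest
decomposition. Since the updated residual is again a linear combination of the same k atoms, the same
argument applies at each subsequent step, and the orthogonality of the residual to the selected atoms
ensures that no atom is chosen twice.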

■ 
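As a quick numerical reading of condition (5), the sketch below returns the largest sparsity level that the condition admits for a given mutual incoherence, and checks the equivalent form $\mu(\Psi) < 1/(2k-1)$. The function names `max_sparsity` and `satisfies_erc` are hypothetical helpers introduced here for illustration only.

```python
import math

def max_sparsity(mu):
    """Largest integer k with k < 0.5 * (1 + 1/mu), i.e. condition (5)."""
    if not 0.0 < mu <= 1.0:
        raise ValueError("mutual incoherence must lie in (0, 1]")
    bound = 0.5 * (1.0 + 1.0 / mu)
    # The inequality is strict, so an integer-valued bound is itself excluded.
    return math.ceil(bound) - 1

def satisfies_erc(mu, k):
    """Equivalent form of (5): mu < 1 / (2k - 1)."""
    return mu * (2 * k - 1) < 1.0

print(max_sparsity(0.25), satisfies_erc(0.25, 2), satisfies_erc(0.25, 3))
```

For example, with $\mu(\Psi) = 0.25$ the bound in (5) equals 2.5, so 2-sparse signals are covered while 3-sparse signals are not.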

3.2 ERC in the CAWGN Settings 

Recall that $S = \{i : |x_i| \neq 0\}$, and the set of significant or "correct" atoms is $\Psi(S) = \{\psi_i : i \in S\}$.
At each step of the OMP algorithm, the residual vector is projected onto the space spanned by the selected
atoms (columns of $\Psi$). Suppose the algorithm selects the correct atoms at the first t steps and the set of
all selected atoms at the current step is $\Psi(c_t)$. Then $\Psi(c_t)$ contains t atoms and $\Psi(c_t) \subset \Psi(S)$. Recall
that

$$\mathbf{P}_t = \Psi(c_t)\left(\Psi(c_t)^H \Psi(c_t)\right)^{-1} \Psi(c_t)^H \tag{13}$$

is the projection operator onto the linear space spanned by the elements of $\Psi(c_t)$. Then the residual
after t steps can be written as

$$\mathbf{r}_t = (\mathbf{I} - \mathbf{P}_t)\mathbf{y} = (\mathbf{I} - \mathbf{P}_t)\Psi\mathbf{x} + (\mathbf{I} - \mathbf{P}_t)\mathbf{n} \triangleq \mathbf{s}_t + \mathbf{n}_t \tag{14}$$

where $\mathbf{s}_t = (\mathbf{I} - \mathbf{P}_t)\Psi\mathbf{x}$ is the signal part of the residual and $\mathbf{n}_t = (\mathbf{I} - \mathbf{P}_t)\mathbf{n}$ is the noise part of the
residual.

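Let

$$\alpha_{t,1} = \max_{\psi \in \Psi(S)} \{|\psi^H \mathbf{s}_t|\} \tag{15}$$

$$\alpha_{t,2} = \max_{\psi \in \Psi \setminus \Psi(S)} \{|\psi^H \mathbf{s}_t|\} \tag{16}$$

and

$$\beta_t = \max_{\psi \in \Psi} \{|\psi^H \mathbf{n}_t|\} \tag{17}$$

It is clear that in order for OMP to select a correct atom at this step, it is necessary to have

$$\max_{\psi \in \Psi(S)} \{|\psi^H \mathbf{r}_t|\} > \max_{\psi \in \Psi \setminus \Psi(S)} \{|\psi^H \mathbf{r}_t|\} \tag{18}$$

A sufficient condition is $\alpha_{t,1} - \alpha_{t,2} > 2\beta_t$. This is because $\alpha_{t,1} - \alpha_{t,2} > 2\beta_t$ implies

$$\max_{\psi \in \Psi(S)} \{|\psi^H \mathbf{r}_t|\} \ge \alpha_{t,1} - \beta_t > \alpha_{t,2} + \beta_t \ge \max_{\psi \in \Psi \setminus \Psi(S)} \{|\psi^H \mathbf{r}_t|\} \tag{19}$$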

Exploiting the results of Lemma 4 and Lemma 5 in [13], we have the following:

Lemma 1: The minimum eigenvalue of $\Psi(S)^H \Psi(S)$ is less than or equal to the minimum eigenvalue
of $\Psi(u_t)^H (\mathbf{I} - \mathbf{P}_t) \Psi(u_t)$. Moreover, a sufficient condition for selecting a correct atom at the current step is
$\|\mathbf{x}(u_t)\|_2 > \frac{2\sqrt{k-t}\,\beta_t}{1 - (2k-1)\mu(\Psi)}$, where $\Psi(u_t) = \Psi(S) \setminus \Psi(c_t)$ denotes the set of significant atoms that are yet to be
selected and $\mathbf{x}(u_t)$ denotes the corresponding linear coefficients.

The complex Gaussian noise case is of particular interest in this paper. To simplify the derivation, we
first present an important result on the bounded noise case given in [13].

Lemma 2: Suppose $\|\mathbf{n}\|_2 \le b_2$ and $\mu(\Psi) < \frac{1}{2k-1}$. Then the OMP algorithm with the stopping rule
$\|\mathbf{r}_i\|_2 \le b_2$ recovers exactly the true subset of correct atoms if all the nonzero coefficients $x_i$
satisfy $|x_i| \ge \frac{2b_2}{1 - (2k-1)\mu(\Psi)}$.

The results in Lemma 2 can be applied to the case where the noise is Gaussian. This is due to the fact
that Gaussian noise is "essentially bounded", as proved in [16]. Although Lemma 2 is derived for the
real case in [13], it also holds in the complex AWGN case. The proof is in Appendix B. Suppose the noise
vector follows a complex Gaussian distribution, i.e., $\mathbf{n} \sim CN(\mathbf{0}, \sigma^2 \mathbf{I}_m)$, and the entries $n_i$ are i.i.d. Define the
following bounded set

$$B_1 = \left\{\mathbf{n} : \|\mathbf{n}\|_2 \le \sigma\sqrt{m + \sqrt{2m \cdot \ln(2m)}}\right\} \tag{20}$$

Then we have the following result.

Theorem 2: Suppose the noise vector in (3) satisfies $\mathbf{n} \sim CN(\mathbf{0}, \sigma^2 \mathbf{I}_m)$, the entries of the noise are i.i.d., and the real part as
well as the imaginary part of each entry are also i.i.d. Then the Gaussian error satisfies

$$P(\mathbf{n} \in B_1) \ge 1 - \frac{1}{2\sqrt{\pi \cdot \ln(2m)}} \tag{21}$$

The proof is in Appendix C.
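The bound of Theorem 2 can be checked empirically with a short Monte Carlo sketch. The code below is only an illustration under stated assumptions: the helper names, the number of trials and the seed are hypothetical choices, and each noise entry is drawn with independent real and imaginary parts of variance $\sigma^2/2$, as in Appendix C.

```python
import math
import random

def noise_threshold(sigma, m):
    """Radius of the set B_1 in (20): sigma * sqrt(m + sqrt(2 m ln(2 m)))."""
    return sigma * math.sqrt(m + math.sqrt(2.0 * m * math.log(2.0 * m)))

def probability_lower_bound(m):
    """Right-hand side of (21): 1 - 1 / (2 sqrt(pi ln(2 m)))."""
    return 1.0 - 1.0 / (2.0 * math.sqrt(math.pi * math.log(2.0 * m)))

def empirical_coverage(sigma, m, trials=2000, seed=0):
    """Fraction of simulated CN(0, sigma^2 I_m) vectors that fall inside B_1."""
    rng = random.Random(seed)
    radius = noise_threshold(sigma, m)
    s = sigma / math.sqrt(2.0)  # standard deviation of real and imaginary parts
    hits = 0
    for _ in range(trials):
        energy = sum(rng.gauss(0.0, s) ** 2 + rng.gauss(0.0, s) ** 2
                     for _ in range(m))
        hits += math.sqrt(energy) <= radius
    return hits / trials

m = 30  # number of frequency samples used later in the simulation section
print(noise_threshold(1.0, m), probability_lower_bound(m), empirical_coverage(1.0, m))
```

Since (21) is obtained by dropping an exponential factor in the chi-square tail bound, the empirical coverage is expected to lie noticeably above the stated lower bound.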

If the noise bound takes a different form, a different result follows directly, as stated in Corollary 1.

Corollary 1: Suppose the noise vector in (3) satisfies $\mathbf{n} \sim CN(\mathbf{0}, \sigma^2 \mathbf{I}_m)$, the entries of the noise are i.i.d., the real part as
well as the imaginary part of each entry are also i.i.d., and $B_2 = \left\{\mathbf{n} : \|\mathbf{n}\|_2 \le \sigma\sqrt{m + \frac{1}{2} \cdot \sqrt{m \cdot \ln(m)}}\right\}$. Then
the Gaussian error satisfies

$$P(\mathbf{n} \in B_2) \ge 1 - \sqrt{\frac{2}{\pi \cdot \ln(m)}} \tag{22}$$

The proof is in Appendix D.

Lemma 2 suggests that one can apply the results obtained for the bounded error case to solve the
complex Gaussian noise problem. We directly apply the result for the $\ell_2$ bounded noise case (Lemma 2)
and Theorem 2 to get the ERC in the CAWGN case.

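Theorem 3: Suppose $\mathbf{n} \sim CN(\mathbf{0}, \sigma^2 \mathbf{I}_m)$, $\mu(\Psi) < \frac{1}{2k-1}$, and all the nonzero coefficients $x_i$ satisfy

$$|x_i| \ge \frac{2\sigma\sqrt{m + \sqrt{2m \cdot \ln(2m)}}}{1 - (2k-1)\mu(\Psi)} \tag{23}$$

Then the OMP algorithm with the stopping rule $\|\mathbf{r}_i\|_2 \le \sigma\sqrt{m + \sqrt{2m \cdot \ln(2m)}}$ selects the true subset
with probability at least $1 - \frac{1}{2\sqrt{\pi \cdot \ln(2m)}}$.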

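Meanwhile, with the results in Lemma 2 and Corollary 1 we can obtain a different ERC in the CAWGN
case.

Theorem 4: Suppose $\mathbf{n} \sim CN(\mathbf{0}, \sigma^2 \mathbf{I}_m)$, $\mu(\Psi) < \frac{1}{2k-1}$, and all the nonzero coefficients $x_i$ satisfy

$$|x_i| \ge \frac{2\sigma\sqrt{m + \frac{1}{2} \cdot \sqrt{m \cdot \ln(m)}}}{1 - (2k-1)\mu(\Psi)} \tag{24}$$

Then the OMP algorithm with the stopping rule $\|\mathbf{r}_i\|_2 \le \sigma\sqrt{m + \frac{1}{2} \cdot \sqrt{m \cdot \ln(m)}}$ selects the true subset
$\Psi(S)$ with probability at least $1 - \sqrt{\frac{2}{\pi \cdot \ln(m)}}$.

We omit the proofs of Theorem 3 and Theorem 4, since they follow directly from Lemma 2 combined with
Theorem 2 and Corollary 1, respectively.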

Before we end the theoretical analysis, we should mention that all the results derived in this
paper are worst-case ones, implying that the guarantees we obtain are over-pessimistic, as they
are required to hold for all signals and for all possible supports of a given cardinality. Besides, compared
with the ERC in [13], the recovery success probability of the derived ERC is larger. The main reason is that
in the complex case the measurement vector, the dictionary, the high-dimensional sparse unknown vector and
the noise vector are all assumed complex. If all of them are real, the ERC reduces to the results in [13].
However, the dictionary MIP plays a more fundamental role, and the constraint $\mu(\Psi) < \frac{1}{2k-1}$ is
unchanged.

4. An Application for Complex Case

In this section, we present a concrete application of the OMP algorithm in the complex case via the GTD model,
which is widely used by the radar imaging community [17]. The GTD model was proposed in
[18] and [19]. We first give the mathematical description of the model in radar imaging.

4.1 Simulation Application Formulation 

In this paper, the ideal point scattering mechanism is considered. It assumes that the measured scattering
data from d scattering centers at M sampled frequency points $f_m$ ($m = 0, 1, \cdots, M-1$) and one aspect
angle are given by [20]

$$y_m = \sum_{p=1}^{d} A_p \exp\left\{-j\frac{4\pi}{c} f_m r_p\right\} \tag{25}$$

The model parameters $\{A_p, r_p\}_{p=1}^{d}$ characterize the intensities of the d individual scattering centers and the
distances from the reference center on the target to the scatterers, respectively. $A_p$ is a complex scalar providing
the magnitude, $f_m$ is the m-th measurement frequency, and $c$ is the speed of light in free space. Using the
relation $\tau_p = \frac{2r_p}{c}$, the model (25) can be written in compact matrix form in the noisy setting,

$$\mathbf{y} = \Psi\mathbf{x} + \mathbf{n} \tag{26}$$

where $\mathbf{y} \in \mathbb{C}^{m\times 1}$ is the observation vector in frequency and $\Psi \in \mathbb{C}^{m\times n}$ is the transform matrix whose
element in the l-th row and p-th column is

$$[\Psi]_{l,p} = \exp\{-j2\pi f_l \tau_p\} \tag{27}$$

$\mathbf{x} \in \mathbb{C}^{n\times 1}$ corresponds to the magnitudes of the scattering centers. $\mathbf{n} \in \mathbb{C}^{m\times 1}$ is the stochastic measurement
noise vector, assumed to satisfy $\mathbf{n} \sim CN(\mathbf{0}, \sigma^2 \mathbf{I}_m)$ with i.i.d. entries. Note that all the
columns are normalized (i.e., $\|\psi_i\|_2 = 1$ for $i = 1, 2, \cdots, n$). Obviously, (26) poses the problem of
recovering a high-dimensional sparse signal from a small number of
linear measurements in the noisy setting.
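To make the structure of the dictionary in (27) concrete, the sketch below builds normalized atoms on a range grid and evaluates the coherence between pairs of atoms. It is a sketch under stated assumptions: it uses the delay relation $\tau_p = 2r_p/c$, a hypothetical 5 cm range grid over a 5 m target, and hypothetical helper names `gtd_dictionary` and `coherence`; the paper does not state the grid that was actually used. The frequency settings (start 1 GHz, step 10 MHz, 30 samples) follow the simulation description.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in free space, in m/s

def gtd_dictionary(f0, df, num_freqs, ranges):
    """Columns with entries exp(-j 2 pi f_l tau_p) / sqrt(M), where tau_p = 2 r_p / C."""
    scale = 1.0 / math.sqrt(num_freqs)  # gives every atom unit l2 norm
    atoms = []
    for r in ranges:
        tau = 2.0 * r / C
        atoms.append([scale * cmath.exp(-2j * math.pi * (f0 + l * df) * tau)
                      for l in range(num_freqs)])
    return atoms

def coherence(u, v):
    """Magnitude of the inner product u^H v between two normalized atoms."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v)))

# Hypothetical range grid: 0 m to 5 m in 5 cm steps (assumption of this sketch).
ranges = [0.05 * p for p in range(101)]
atoms = gtd_dictionary(1e9, 10e6, 30, ranges)

print(coherence(atoms[0], atoms[1]))   # neighbouring grid points, 5 cm apart
print(coherence(atoms[0], atoms[10]))  # grid points 50 cm apart
```

Because the frequencies are uniformly spaced, the coherence between two atoms depends only on their range separation and follows a Dirichlet-kernel shape, so closely spaced atoms are strongly coherent while atoms separated by about $c/(2B)$, with $B$ the total bandwidth, are nearly orthogonal.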

4.2 Simulation Results 

In this subsection, a Monte Carlo simulation with 10,000 trials is carried out to confirm the
theoretical analysis of section 3 via the application introduced in subsection 4.1. In the simulation,
the measured frequency band ranges from 1 GHz to 1.3 GHz in the L band, where the start frequency is
$f_0$ = 1 GHz and the frequency sampling interval is 10 MHz, so that 30 complex frequency samples are
measured. Furthermore, we assume the target is 5 m long and composed of one to five scatter points
located at 0.3 m, 0.85 m, 2.25 m, 4.0 m and 4.75 m from the target front-end, respectively. The measured
frequency samples are either noiseless or contaminated by CAWGN with SNR = 20 dB.

Fig. 2: Inter-atom mutual incoherence property (2D); horizontal axis: atom number

The dictionary mutual incoherence coefficients shown in Fig. 1 and Fig. 2 are calculated by formula (4).
Fig. 1 shows a three-dimensional plot over all atoms of $\Psi$, and Fig. 2 shows the corresponding two-dimensional view.
We can see that the smaller the interval between two atoms, the larger the coherence. Besides, once
the support of the sparse vector $\mathbf{x}$ is determined with OMP, we further calculate the non-zero values over
this support with the LS method. Hence, Fig. 3 and Fig. 4 also present the cumulative distribute error
(CDE) w.r.t. different k-sparse settings (i.e., k varies from 1 to 5 in the simulation) in the noiseless and
SNR = 20 dB cases, respectively. Finally, Fig. 5 presents the simulation results on the recovery probability w.r.t.
different k-sparse settings in both the noiseless and SNR = 20 dB cases. As Fig. 2 implies, we give some remarks
about the OMP algorithm below.

Fig. 3: CDE vs. different k-sparse (noiseless)

Fig. 4: CDE vs. different k-sparse (SNR = 20 dB)

Remark 4: Fig. 3 and Fig. 4 show that the CDE increases monotonically with the sparsity level k.
This is easy to understand: as k increases from 1 to 5, it becomes harder and harder for
$\mu(\Psi)$ to satisfy the inequality constraint (12). Fig. 3 considers the noiseless case, while Fig. 4
presents the CDE for the case SNR = 20 dB.

Fig. 5: Successful recovery probability w.r.t. different k-sparse settings

Remark 5: If $\mathbf{x}$ contains only one scatter point, the task is to recover a 1-sparse vector. The recovery
success probability must be 100% in this setting, because for a 1-sparse $\mathbf{x}$ the condition $\mu(\Psi) < 1$ on the
mutual incoherence among all atoms always holds, even in the noisy setting with SNR = 20 dB; hence OMP
selects the correct support of $\mathbf{x}$. Meanwhile, when $\mathbf{x}$ is 2-sparse, it is also recovered
with probability 100% in both the noiseless and the SNR = 20 dB settings. As a matter of fact, it can
be recovered because the inter-atom incoherence condition $\mu(\Psi) < 1/3$ is satisfied whenever the interval between
the support indices is large enough that the mutual incoherence of the two selected atoms is below 1/3.
This incoherence condition is satisfied here (see Fig. 2).

5. Conclusion and Future Work 

In this paper, some new results on using the OMP algorithm to solve the sparse approximation problem
over redundant dictionaries in the complex case are presented. Using the mutual incoherence property
to quantify the inter-atom interference (IAI) level in the dictionary, we provide a sufficient condition under
which OMP can recover the optimal representation of an exactly sparse signal in the complex setting.
We leverage this theory to show that OMP succeeds with high probability for k-sparse signals from a
class of dictionaries. More importantly, the newly proposed ERC for the complex case complements the
existing ERC results for OMP. Finally, we confirm the correctness of the theoretical
analysis via simulation experiments.

Some future work remains to be addressed. First, we only consider the ERC of the classical OMP algorithm in
the complex case and do not yet take IAI into account (see Fig. 1 and Fig. 2). In fact, if the interval between two
non-zero elements in $\mathbf{x}$ is small, or the two elements are even adjacent to each other, $\mathbf{x}$ cannot be recovered with high
probability, which is mainly caused by IAI. Although there is much literature on mitigating IAI with a
sensing dictionary, such as [21] and [22], all of it is for real dictionaries. How to mitigate
IAI in the complex case is therefore worthy of research. Second, we assume that there is only one scatter type,
but in fact there are several scatter types, as described in [19]. If all scatter types are considered, the
dictionary becomes large. Hence, how to reduce the dictionary dimension (in other words,
how to reduce the computational cost) is also a problem. Third, we use the LS method to recover the non-zero values
in the sparse vector $\mathbf{x}$. How to reduce the recovery error of the non-zero values in the noisy setting is also a
question to consider. Besides, OMP works correctly for a fixed signal and measurement matrix with high probability,
and so it must fail for some sparse signals and matrices [23]. As far as we know, this problem also exists
in the complex setting.

Acknowledgment 

The authors would like to thank the anonymous reviewers for their comments, which helped to improve the
quality of the paper. This research was supported by the National Natural Science Foundation of China
(NSFC) under Grant 61172140, and '985' key projects for excellent teaching team supporting
(postgraduate) under Grant A1098522-02. Yipeng Liu is supported by FWO PhD/postdoc grant: G.0108.11
(Compressed Sensing).

Appendix A 
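Proof of (9) and (10)

In (9), for the left-hand side we have

$$\begin{aligned}
\left|\sum_{t=1}^{k} x_t \psi_1^H \psi_t\right| &\ge |x_1| - \sum_{t=2}^{k} |x_t| \, |\psi_1^H \psi_t| \\
&\ge |x_1| - \sum_{t=2}^{k} |x_t| \, \mu(\Psi) \\
&\ge |x_1|\left(1 - \mu(\Psi)(k-1)\right)
\end{aligned} \tag{28}$$

Here we have exploited the definition of the mutual coherence (4) and the descending ordering of the
values $|x_j|$. Similarly, in (10), the left-hand-side term is bounded by

$$\begin{aligned}
\left|\sum_{t=1}^{k} x_t \psi_i^H \psi_t\right| &\le \sum_{t=1}^{k} |x_t| \, |\psi_i^H \psi_t| \\
&\le \sum_{t=1}^{k} |x_t| \, \mu(\Psi) \\
&\le |x_1| \mu(\Psi) k
\end{aligned} \tag{29}$$
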

Appendix B 
Proof of Lemma 2 

It follows from the assumption $\|\mathbf{n}\|_2 \le b_2$ that

$$\|\mathbf{n}_t\|_2 = \|(\mathbf{I} - \mathbf{P}_t)\mathbf{n}\|_2 \le \|\mathbf{n}\|_2 \le b_2$$

Let $\psi_i$ be any column of $\Psi$. Then,

$$|\psi_i^H \mathbf{n}_t| \le \|\psi_i\|_2 \|\mathbf{n}_t\|_2 \le b_2$$

This means $\beta_t \le b_2$. It follows from Lemma 1 that for any $t < k$, $\|\mathbf{x}(u_t)\|_2 > \frac{2\sqrt{k-t}\,b_2}{1 - (2k-1)\mu(\Psi)}$ implies
that a correct atom will be selected at this step. So $|x_i| \ge \frac{2b_2}{1 - (2k-1)\mu(\Psi)}$ for all nonzero coefficients $x_i$
ensures that all the k correct atoms will be selected in the first k steps.

Let us now turn to the stopping rule. Let $\mathbf{P}_k$ denote the projection onto the linear space spanned by
$\Psi(S)$. Then $\|(\mathbf{I} - \mathbf{P}_k)\mathbf{n}\|_2 \le \|\mathbf{n}\|_2 \le b_2$. When all the k correct atoms are selected, the $\ell_2$ norm of the
residual will be less than $b_2$. Hence the algorithm stops. It remains to be shown that the OMP algorithm
does not stop early.

Suppose the algorithm has run t steps for some $t < k$. We will verify that $\|\mathbf{r}_t\|_2 > b_2$, so that OMP does
not stop at the current step. Again, let $\Psi(u_t)$ denote the set of unselected but correct atoms and $\mathbf{x}(u_t)$
denote the corresponding coefficients. Note that

$$\begin{aligned}
\|\mathbf{r}_t\|_2 &= \|(\mathbf{I} - \mathbf{P}_t)\Psi\mathbf{x} + (\mathbf{I} - \mathbf{P}_t)\mathbf{n}\|_2 \\
&\ge \|(\mathbf{I} - \mathbf{P}_t)\Psi\mathbf{x}\|_2 - \|(\mathbf{I} - \mathbf{P}_t)\mathbf{n}\|_2 \\
&\ge \|(\mathbf{I} - \mathbf{P}_t)\Psi(u_t)\mathbf{x}(u_t)\|_2 - b_2
\end{aligned} \tag{30}$$

It follows from Lemma 1 that, with $\lambda_{\min}$ the minimum eigenvalue of $\Psi(S)^H \Psi(S)$,

$$\begin{aligned}
\|(\mathbf{I} - \mathbf{P}_t)\Psi(u_t)\mathbf{x}(u_t)\|_2 &\ge \lambda_{\min} \|\mathbf{x}(u_t)\|_2 \\
&\ge \left(1 - (k-1)\mu(\Psi)\right) \frac{2b_2}{1 - (2k-1)\mu(\Psi)} \\
&> 2b_2
\end{aligned} \tag{31}$$

So,

$$\|\mathbf{r}_t\|_2 \ge \|(\mathbf{I} - \mathbf{P}_t)\Psi(u_t)\mathbf{x}(u_t)\|_2 - b_2 > b_2$$

and the lemma is proved.

Appendix C 
Proof of Theorem 2 

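Without loss of generality, let the k-th element of the noise vector $\mathbf{n}$ be $n_k$. Then the $n_k$ are independent
and identically distributed with $n_k \sim CN(0, \sigma^2)$.

As the real part and the imaginary part of $n_k$ are also i.i.d., we have

$$\Re\{n_k\} \sim N\left(0, \frac{\sigma^2}{2}\right) \quad \text{and} \quad \Im\{n_k\} \sim N\left(0, \frac{\sigma^2}{2}\right)$$

Equivalently, we further have

$$\frac{\sqrt{2}}{\sigma} \cdot \Re\{n_k\} \sim N(0, 1) \quad \text{and} \quad \frac{\sqrt{2}}{\sigma} \cdot \Im\{n_k\} \sim N(0, 1)$$

Considering the independence between the real part and the imaginary part of $\mathbf{n}$, we have the following
equation

$$\|\mathbf{n}\|_2^2 = \|\Re\{\mathbf{n}\}\|_2^2 + \|\Im\{\mathbf{n}\}\|_2^2$$

Then, we can further get

$$\begin{aligned}
Y_{2m} \triangleq \frac{2}{\sigma^2}\|\mathbf{n}\|_2^2 &= \frac{2}{\sigma^2}\left(\sum_{k=1}^{m} \Re\{n_k\}^2 + \sum_{k=1}^{m} \Im\{n_k\}^2\right) \\
&= \sum_{k=1}^{m} \left(\frac{\sqrt{2}}{\sigma}\Re\{n_k\}\right)^2 + \sum_{k=1}^{m} \left(\frac{\sqrt{2}}{\sigma}\Im\{n_k\}\right)^2
\end{aligned} \tag{32}$$

Obviously,

$$Y_{2m} \sim \chi^2(2m)$$

It follows from Lemma 4 in [24] that for any $\lambda > 0$, and using the inequality
$\ln(1 + \lambda) \le \lambda$, which holds for $\lambda > -1$ with equality only at $\lambda = 0$, we get

$$\begin{aligned}
P\left(Y_{2m} > (1 + \lambda)2m\right) &\le \frac{1}{\lambda\sqrt{2\pi m}} \left((1 + \lambda)e^{-\lambda}\right)^m \\
&= \frac{1}{\lambda\sqrt{2\pi m}} \exp\{-m(\lambda - \ln(1 + \lambda))\} \\
&\le \frac{1}{\lambda\sqrt{2\pi m}}
\end{aligned} \tag{33}$$

Hence,

$$\begin{aligned}
P\left(\frac{2}{\sigma^2}\|\mathbf{n}\|_2^2 \le 2m + 2\sqrt{2m \cdot \ln(2m)}\right) &= P\left(Y_{2m} \le 2m + 2\sqrt{2m \cdot \ln(2m)}\right) \\
&= 1 - P\left(Y_{2m} > 2m + 2\sqrt{2m \cdot \ln(2m)}\right) \\
&= 1 - P\left(Y_{2m} > 2m\left(1 + \sqrt{2m^{-1} \cdot \ln(2m)}\right)\right) \\
&= 1 - P\left(Y_{2m} > 2m(1 + \lambda)\right) \\
&\ge 1 - \frac{1}{\lambda\sqrt{2\pi m}}
\end{aligned}$$

Finally, we substitute $\lambda = \sqrt{2m^{-1} \cdot \ln(2m)}$ into the inequality above. It becomes

$$P\left(\|\mathbf{n}\|_2 \le \sigma\sqrt{m + \sqrt{2m \cdot \ln(2m)}}\right) \ge 1 - \frac{1}{2\sqrt{\pi \cdot \ln(2m)}} \tag{34}$$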

Hence, the result of Theorem 2 holds.

Appendix D
Proof of Corollary 1

For the corollary, similar to the proof of Theorem 2, the procedure is as follows:

$$\begin{aligned}
P\left(\frac{2}{\sigma^2}\|\mathbf{n}\|_2^2 \le 2m + \sqrt{m \cdot \ln(m)}\right) &= P\left(Y_{2m} \le 2m + \sqrt{m \cdot \ln(m)}\right) \\
&= 1 - P\left(Y_{2m} > 2m + \sqrt{m \cdot \ln(m)}\right) \\
&= 1 - P\left(Y_{2m} > 2m\left(1 + \frac{1}{2} \cdot \sqrt{m^{-1} \cdot \ln(m)}\right)\right) \\
&= 1 - P\left(Y_{2m} > 2m(1 + \lambda)\right) \\
&\ge 1 - \frac{1}{\lambda\sqrt{2\pi m}}
\end{aligned} \tag{35}$$

Similarly, substituting $\lambda = \frac{1}{2}\sqrt{m^{-1} \cdot \ln(m)}$ into the inequality above, we get the result:

$$P\left(\|\mathbf{n}\|_2 \le \sigma\sqrt{m + \frac{1}{2} \cdot \sqrt{m \cdot \ln(m)}}\right) \ge 1 - \sqrt{\frac{2}{\pi \cdot \ln(m)}}$$